UC Berkeley Library
Statistics Undergraduate Student Association
Spring 2018
“An approximate answer to the right question is worth a great deal more than a precise answer to the wrong question.”
-John Tukey
| Unit of Analysis | Geography | Time-Period | Frequency |
|---|---|---|---|
| Aggregated data or microdata? (counties/nations/households vs. individuals) | Is there a geographic component to your topic? (U.S., Sub-Saharan Africa, India) | Do you want data for a specific time period? (1980-2000, 1930-1960) | How often do you want measures for your variables? (every year, every ten years, monthly, quarterly) |
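To make the unit-of-analysis and frequency questions concrete, here is a minimal base R sketch that turns microdata into aggregated data. The `households` data frame and all of its values are hypothetical, made up purely for illustration; they are not from any real source.

```r
# Hypothetical microdata: one row per household (values are invented).
households <- data.frame(
  county = c("A", "A", "B", "B", "B"),
  year   = c(2000, 2000, 2000, 2010, 2010),
  income = c(40, 60, 50, 70, 90)
)

# Aggregate to one row per county and year: the unit of analysis
# changes from household to county, measured once per listed year.
county_year <- aggregate(income ~ county + year, data = households, FUN = mean)

# Sort for readability.
county_year <- county_year[order(county_year$county, county_year$year), ]
print(county_year)
```

Going from microdata to aggregates is straightforward, but the reverse is not possible, so it usually pays to decide on the unit of analysis before searching for data.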
| Researchers | Government Agencies | NGO/IGOs | Research Organizations |
|---|---|---|---|
| Do you know anyone who is doing this kind of research? | Think about government agencies: is the request for official statistics or data that they'd be likely to collect and publish? (Department of Energy, CDC, Census Bureau) | Are there councils or interest organizations devoted to the topic that might collect data independently? (World Bank, OECD) | Would any specific research organizations be interested in the topic? (Pew, Roper, Gallup, ACLU) |
https://libraries.mit.edu/scholarly/publishing/apis-for-scholarly-resources/
https://en.wikipedia.org/wiki/UFO_sightings_in_the_United_States
library(rvest)
library(dplyr)  # provides the %>% pipe

# Download and parse the Wikipedia page
ufo <- read_html("https://en.wikipedia.org/wiki/UFO_sightings_in_the_United_States")

# Extract each column by the position of its cell within a table row
ufo_date <- html_nodes(ufo, "td:nth-child(1)") %>% html_text()
ufo_date <- ufo_date[c(-1, -44)]  # remove extra elements
ufo_state <- html_nodes(ufo, "td:nth-child(3)") %>% html_text()
ufo_name <- html_nodes(ufo, "td:nth-child(4)") %>% html_text()

# Combine the three vectors into one data frame and preview it
ufo_df <- data.frame(date = ufo_date, name = ufo_name, state = ufo_state)
head(ufo_df, n = 5)
## date name state
## 1 \nIndex of ufology articles\n Cooper St. UFO crash New York
## 2 April 1997 Battle of Los Angeles California
## 3 February 24, 1942 Maury Island incident Washington
## 4 June 21, 1947 Kenneth Arnold UFO sighting Washington
## 5 June 24, 1947 Montana
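Selecting one column at a time can leave the vectors out of step with each other when a row has a missing or extra cell. If the page marks up its data as an HTML table, one alternative is to parse whole rows with `html_table()`, which keeps each row's cells together. The sketch below is a minimal illustration on an inline, hypothetical HTML snippet with placeholder values; it is not the Wikipedia page, and the real page's table layout is an assumption that would need checking.

```r
library(rvest)

# Hypothetical stand-in for a page containing a sightings table.
# The cell values are placeholders, not real data.
page <- read_html("<table>
  <tr><th>Date</th><th>City</th><th>State</th><th>Name</th></tr>
  <tr><td>Date A</td><td>City A</td><td>State A</td><td>Incident A</td></tr>
  <tr><td>Date B</td><td>City B</td><td>State B</td><td>Incident B</td></tr>
</table>")

# Parse the first table as a whole, using the first row as column names.
sightings <- html_table(html_node(page, "table"), header = TRUE)
sightings <- as.data.frame(sightings)

# Keep only the columns of interest; rows stay aligned by construction.
sightings <- sightings[, c("Date", "Name", "State")]
print(sightings)
```

On a real page with several tables, `html_nodes(page, "table")` returns all of them, and the one to parse would have to be picked by inspecting the page.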
https://vincentarelbundock.github.io/Rdatasets/datasets.html
Up to $5000 for one-time purchases for research projects
Library will host the data and handle the licensing
Terms and how to apply: http://guides.lib.berkeley.edu/data
“Research Data Management helps researchers navigate the increasingly complex landscape of data planning, storage, and sharing”
Peer Consulting in collaboration with Division of Data Sciences